- 
            Join operations are crucial in data analysis but can be inefficient on large datasets and with complex non-equality-based conditions. Optimized join algorithms have gained traction in database research to address these challenges. One popular choice for implementing join algorithms is distributed data processing frameworks, e.g., Hadoop and Spark, but each implementation is highly tailored to specific query types. As a result, these implementations do not address join queries that involve diverse and complex conditions, since they are not integrated into a holistic query optimization engine as in a DBMS. On the other hand, implementing new join algorithms in a DBMS from scratch requires substantial effort and expertise. This paper introduces FUDJ, Flexible User-defined Distributed Joins, a framework for complex distributed join algorithms. The key idea of FUDJ is to allow developers to integrate new distributed join algorithms into the database without delving into the database internals. As shown, an algorithm implemented in FUDJ is up to an order of magnitude faster than existing user-defined implementations with an order of magnitude fewer lines of code.
- 
            Modern visual data exploration systems are designed as client-server applications in which the front-end interface generates a large number of queries that are handled by a back-end database server. Because data exploration is a trial-and-error process, a significant share of these queries return an empty result, which does not change the state of the visualization. These requests still add significant overhead in network communication, request handling, and data processing. Moreover, given the virtually unlimited query space, it is impractical to enumerate and send all empty (or all non-empty) queries to the client to filter them. This paper introduces HQ-Filter, a hierarchy-aware filter for empty-result queries, which uses the hierarchical nature of the data to construct a configurable, probabilistic filter. HQ-Filter can filter out empty-result queries on the client side with minimal size and processing overhead. HQ-Filter is applied to two existing data exploration systems for geospatial data, UCR-Star and Cloudberry. In both cases, it eliminates hundreds of queries per user, which yields up to a 66% increase in server capacity by providing up to a 15x speedup in average response time and up to a 90% decrease in server workload.
- 
            The ever-rising volume of geospatial data is undeniable, and so is the need to explore and analyze these datasets. However, these datasets vary widely in size, coverage, and accuracy, so users need to assess these aspects to choose the right dataset for their analysis. Unfortunately, all the publicly available repositories for geospatial datasets provide only a list of datasets with some information about them, with no way to explore the datasets beforehand. Through this demonstration, we propose UCR-Star, a repository capable of hosting hundreds of thousands of geospatial datasets that a user can explore visually to judge their quality before even downloading them. This demo provides a deeper dive into the core engine behind UCR-Star. It provides a web interface geared towards database researchers to help them understand how the index works internally. It also provides a comparison interface where attendees can see side by side how two versions of the system work, with the ability to customize each of them separately. Finally, the interface reports the response time of the indexes for a quantitative comparison.
 An official website of the United States government